AITopics | angular update

Collaborating Authors

angular update

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

326a8c055c0d04f5b06544665d8bb3ea-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 09:39:30 GMT

artificial intelligence, equilibrium, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

Add feedback

326a8c055c0d04f5b06544665d8bb3ea-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 03:56:35 GMT

angular update, effective learning rate, weight norm, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

SphericalMotionDynamics

Neural Information Processing SystemsFeb-8-2026, 03:56:31 GMT

Then dynamics ofwt is like a physical process - a satellite's motion around the earth (see illustration in Fig.1): according to

artificial intelligence, equilibrium, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Neural Information Processing SystemsFeb-7-2026, 08:54:51 GMT

Learning Rate Warmup is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits.

large language model, machine learning, warmup, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay

Neural Information Processing SystemsDec-23-2025, 23:18:30 GMT

In this paper, we comprehensively reveal the learning dynamics of normalized neural network using Stochastic Gradient Descent (with momentum) and Weight Decay (WD), named as Spherical Motion Dynamics (SMD). Most related works focus on studying behavior of equilibrium state, i.e. assuming weight norm remains unchanged. However, their discussion on why this equilibrium can be reached is either absent or less convincing. Our work directly explores the cause of equilibrium, as a special state of SMD. Specifically, 1) we introduce the assumptions that can lead to equilibrium state in SMD, and prove equilibrium can be reached in a linear rate regime under given assumptions; 2) we propose ``angular update as a substitute for effective learning rate to depict the state of SMD, and derive the theoretical value of angular update in equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks including ImageNet and MSCOCO with standard settings. Experiment results show our theoretical findings agree well with empirical observations. We also show that the behavior of angular update in SMD can produce interesting effect to the optimization of neural network in practice.

dynamic, normalized neural network, spherical motion dynamic, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)

Add feedback

059445c2d5b3ef918079851628fef1d6-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 17:34:55 GMT

gradient, update size, warmup, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Kosson, Atli, Messmer, Bettina, Jaggi, Martin

arXiv.org Artificial IntelligenceOct-31-2024

Learning Rate Warmup is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits. Warmup decreases the update size $\Delta \mathbf{w}_t = \eta_t \mathbf{u}_t$ early in training by using lower values for the learning rate $\eta_t$. In this work we argue that warmup benefits training by keeping the overall size of $\Delta \mathbf{w}_t$ limited, counteracting large initial values of $\mathbf{u}_t$. Focusing on small-scale GPT training with AdamW/Lion, we explore the following question: Why and by which criteria are early updates $\mathbf{u}_t$ too large? We analyze different metrics for the update size including the $\ell_2$-norm, resulting directional change, and impact on the representations of the network, providing a new perspective on warmup. In particular, we find that warmup helps counteract large angular updates as well as a limited critical batch size early in training. Finally, we show that the need for warmup can be significantly reduced or eliminated by modifying the optimizer to explicitly normalize $\mathbf{u}_t$ based on the aforementioned metrics.

arxiv, update size, warmup, (13 more...)

arXiv.org Artificial Intelligence

2410.23922

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay

Neural Information Processing SystemsOct-9-2024, 23:57:10 GMT

In this paper, we comprehensively reveal the learning dynamics of normalized neural network using Stochastic Gradient Descent (with momentum) and Weight Decay (WD), named as Spherical Motion Dynamics (SMD). Most related works focus on studying behavior of effective learning rate" inequilibrium" state, i.e. assuming weight norm remains unchanged. However, their discussion on why this equilibrium can be reached is either absent or less convincing. Our work directly explores the cause of equilibrium, as a special state of SMD. Specifically, 1) we introduce the assumptions that can lead to equilibrium state in SMD, and prove equilibrium can be reached in a linear rate regime under given assumptions; 2) we propose angular update" as a substitute for effective learning rate to depict the state of SMD, and derive the theoretical value of angular update in equilibrium state; 3) we verify our assumptions and theoretical results on various large-scale computer vision tasks including ImageNet and MSCOCO with standard settings. Experiment results show our theoretical findings agree well with empirical observations.

normalized neural network, sgd and weight decay, spherical motion dynamic, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)

Add feedback

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

Wan, Ruosi, Zhu, Zhanxing, Zhang, Xiangyu, Sun, Jian

arXiv.org Machine LearningOct-2-2020

In this work, we comprehensively reveal the learning dynamics of neural network with normalization, weight decay (WD), and SGD (with momentum), named as Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on "effective learning rate" in "equilibrium" condition, where weight norm remains unchanged. However, their discussions on why equilibrium condition can be reached in SMD is either absent or less convincing. Our work investigates SMD by directly exploring the cause of equilibrium condition. Specifically, 1) we introduce the assumptions that can lead to equilibrium condition in SMD, and prove that weight norm can converge at linear rate with given assumptions; 2) we propose "angular update" as a substitute for effective learning rate to measure the evolving of neural network in SMD, and prove angular update can also converge to its theoretical value at linear rate; 3) we verify our assumptions and theoretical results on various computer vision tasks including ImageNet and MSCOCO with standard settings. Experiment results show our theoretical findings agree well with empirical observations.

artificial intelligence, equilibrium condition, machine learning, (17 more...)

arXiv.org Machine Learning

2006.08419

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback